The purpose of Section V is to reflect upon the findings of this research and make them actionable and well contextualised for future designers, researchers and innovators. Chapter 10 looks back across the thesis to produce a set of clear and actionable principles for better Human Data Relations, to offer pragmatic and personal reflections on this body of work and the agenda it advocates, and to position the PhD in terms of its potential legacy and limitations.
“Our research should transform, not just inform, society.” —Kingsley Ofosu-Ampong (researcher and lecturer in digital transformation)
This Digital Civics PhD has explored, from a constructivist, individualist, and pragmatist point of view, the nature of the power imbalance between individuals and those who hold data about them. It has built upon bodies of literature in multiple disciplines, including information theory, data rights legislation, surveillance capitalism, Personal Information Management, Human-Computer Interaction, Human-Data Interaction, Personal Data Ecosystems and Human-centred Design. In doing so, it identified a research gap around the lived experience of today’s data-centric world, recognising that there was a need to explore the role that personal data should play in people’s lives, and how the current power imbalance over personal data [2.1.2] affects people’s attitudes and capabilities.
Through the Case Studies in Section II, it explored this research gap, reaching, in Chapter 6 of Section III, a clear answer to its research question [2.4] in terms of six wants that people have in their direct and indirect data relations; people want:
These six wants provided a basis for envisioning a world where people become empowered digital citizens with mastery of their own personal data.
Through its unique two-track approach of real-world project placements taking place alongside the participatory research, as described in Section IV, this evolving vision for better relations with and through data was tested, refined and scrutinised through a practical, sociotechnical lens that began consideration of a further question, of how the world might be changed towards this new vision. The result of this hybrid learning was presented as a new research agenda, Human Data Relations (HDR). HDR was defined in Chapter 7 within Section III, culminating in the expression of four clear objectives:
Through Section IV, this new research agenda was given flesh by mapping out the understood obstacles to progress when pursuing these objectives, and by providing four strategic trajectories for pursuing societal change towards better HDR:
What remains for this chapter to address is to distil what has been learned about what people want from data in their lives (and how those wants might be met) into a set of principles from which future researchers, designers and activists might generate new research projects, social interventions, policies, or other initiatives [10.1], before reflecting critically on the research agenda [10.2] and upon this body of research in terms of its limitations [10.3], personal impact upon the researcher [10.4] and potential legacy [10.5].
In line with this researcher’s objectives [1.1.1], this PhD set out to have an impact upon the world with regard to tackling the power imbalance over personal data. Indeed, through publications, methodological contributions and industry adoption [1.2; 1.3] it already has. In order to maximise potential impact, this thesis will conclude not by providing a prescriptive roadmap for HDR, but by sharing a versatile set of principles that can be applied by all who wish to pursue this research agenda.
The principles are framed as principles for generative action, which refers to the idea of producing solutions that themselves stimulate further solutions. This is in line with the concept of generativity, which in psychology refers to the creation of something novel, valuable and meaningful (McAdams, 1996), and in sociology to a stage of adult development focused on productivity, creativity and contributing to society (Slater, 2003).
In recognition of the fact that the power imbalance over personal data is a sociotechnical problem, these principles are broader than just design insights for influencing UI design or software system specifications; they could also be applied at a societal level to influence funding and policy decisions, social interventions, business strategy, grassroots activism and more. As such, they are a tangible and actionable output of this research, rooted in human-centric philosophy, participatory findings and sociotechnical reality.
In the pilot study and Case Study One, data cards were used to represent civic data [Figure 3.7]. In Case Study Two and in Hestia.ai’s digipower investigation [See Section IV Introduction], a categorisation of provider-held data was displayed [Figure 3.8]. In my BBC research report (Bowyer, 2020), the use of relatable examples was identified as an important way to help people understand what a piece of data represents. Recalling that to make data meaningful, we must be able to interpret it as information [2.1.1], this can be refined further:
To make data meaningful, it needs to be expressed as information about your life.
Spreadsheets and ‘big data’ sound dry and (to many) dauntingly technical, but once those same datapoints are expressed as ‘facts about your life’, the hurdle of relatability is overcome [4.3.1]. The effectiveness of applying this principle is evident in successful online services like Netflix, Spotify and Strava, and in social media platforms like Facebook: these interfaces show understandable everyday concepts like Friends, Events, Movies and Playlists, not files, records, folders or database rows. They have successfully ‘pushed the technology into the background’, in line with Weiser’s vision (Weiser, 1991) and Rogers’ calm computing. While exploring this idea of representing life concepts further at BBC R&D, I produced Figure 10.1, which shows a near-exhaustive overview of the many different informational concepts in an individual’s life that providers might hold as data:
This diagram shows how most common personal data types handled today can be mapped to more relatable life information [7.1] concepts. These life concepts (exemplified where possible) can make data meaningful to individuals, and can help them find value in their data [5.5.3].
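The mapping idea can be sketched in code: raw, provider-centric record types are translated into relatable life information before being shown to a user. This is a minimal illustrative sketch; the record types and phrasings below are invented for the example, not taken from Figure 10.1.

```python
# Illustrative translation from provider-centric data types to
# relatable 'life information' concepts (invented examples).
LIFE_CONCEPTS = {
    "http_request_log": "Websites you visited",
    "playlist_row": "Music you listen to",
    "gps_trace": "Places you have been",
    "purchase_record": "Things you bought",
}

def as_life_information(record_type: str) -> str:
    """Express a raw data type as information about the user's life,
    falling back to an honest 'uninterpreted' label."""
    return LIFE_CONCEPTS.get(record_type, "Data about you (uninterpreted)")
```

Even this trivial lookup captures the principle: the user is never shown “gps_trace”, only “Places you have been”.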
It is clear that better HDR involves recognising the splintered and scattered reality of where our data is [Lemley (2021); 8.2.1] and moving beyond it. To make data useable for individuals, the diaspora must be united. This means that data from different sources must first be united (brought together) and then unified, which means making it into a collection of data about the individual and their life, rather than scattered slices of company data that may have secondary value to the individual. This is a multi-faceted sociotechnical problem of access, interpretation and integration [Li, Forlizzi and Dey (2010); 2.2.3]. Negotiability remains important; we can only unite data that we can access, and only data holders can fully explain it [see 8.3 and 8.4]. Setting that aspect aside, the pragmatic way forward begins with creating a space where data can be held, combined, controlled and owned by the individual - ‘a place for your personal data’ [Jones (2011); 2.2.4]. This can form the seed of their new human-centric personal data ecosystem. This follows Bergman’s subjective classification principle, as mentioned in 2.2.2:
‘All related items should be classified together regardless of technological format’—Bergman, Beyth-Marom and Nachmias (2003)
We could add: ‘regardless of where they are held’. This vision is embodied in the Personal Data Store (PDS) concept [2.3.4]. A PDS can bring together personal data from multiple sources that has never co-existed before, enabling the provision of new capabilities over one’s digital life. The BBC R&D Cornmarket project [See Section IV Introduction] examines how to build PDSs, and in 9.4 I explored possible design approaches; at this stage, only the concept is important. Once data is united and unified, a PDS enables the creation of views of data that were not previously possible, because code can execute across data that was previously dispersed. For example, today each separate TV app, device or streaming service maintains its own records of what you have watched. Once unified in a PDS, it would be possible to present a single view of all the past content you had viewed, across all channels, as this mock-up I made during my BBC internship shows:
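The uniting-and-unifying step a PDS performs for viewing history can be sketched as a simple merge across per-service records. This is an illustrative sketch only; the `ViewingRecord` structure and service names are my own assumptions, not the Cornmarket design.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ViewingRecord:
    """One watched item, as it might be extracted from one service."""
    service: str
    title: str
    watched_at: datetime

def unified_history(*per_service_histories):
    """Unite viewing records held by separate services into one
    chronological view of everything the person has watched."""
    merged = [r for history in per_service_histories for r in history]
    return sorted(merged, key=lambda r: r.watched_at)
```

The point is not the sort itself, but that this code can only exist once the previously dispersed records co-exist in one place.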
In Case Study Two [Table 5.4; Bowyer, Holt, et al. (2022), supplemental materials], participants expressed diverse goals for personal data, including reflection, pattern-finding, goal-tracking, and creative use. In the PIM space [2.2.2], relevant innovations include associative exploration, spatial arrangement, and embodied interaction for different contexts. Drawing on all of these allows me to infer that unified data must be transformed into a versatile material. Individuals need to be able to use data, represented as facts or assertions about one’s life, by performing manipulations such as:
Working with data as a material will be new to most people other than data scientists, and it is novel not just for end users but for designers too. Eva Deckers, in her work on data-enabled design, an approach which also calls for data to become a material, notes (and we could extend this to laypeople too):
“Designers are in general not trained and prepared to work with data. They’re not equipped with the right tools. Data manipulation is not part of the schools’ curriculum and designers are rarely interested in understanding data.”—(Deckers, 2018).
Her work with colleagues on the ‘connected baby bottle’ illustrates how treating data as a design material enables a novel, iterative, user-centred development of new capabilities (Bogers et al., 2016). The HDR principle of data as versatile material is distinct from prior conceptualisations of data materialisation as the making of physical objects that manifest data (Beghelli, Huerta-Cánepa and Segal, 2019); rather, it refers to the realisation of human information as a pliable digital substance which users can manipulate for themselves in order to gain new insights and exercise new capabilities. In HDR terms, I theorise that the material should be human information: life information and ecosystem information [7.4]. Achieving data useability [6.1.3] therefore implies a call for the creation of systems that enable human information to be treated as a material.
In 2014, Microsoft official Craig Mundie said:
“Today, there is simply so much data being collected, in so many ways, that it is practically impossible to give people a meaningful way to keep track of all the information about them that exists out there, much less to consent to its collection in the first place.” —Craig Mundie, writing in Foreign Affairs magazine (Mundie, 2014)
Based on what I have learned in the formation of the Human Data Relations agenda, such an attitude is defeatist. It is certainly possible to give people an understanding of a large part of what is happening in their digital lives, and I believe that the concept of ecosystem information [7.1] is central to this. My research showed that acquiring ecosystem information and understanding is a key motivator for many people, encompassing 74% of participant goals in Case Study Two [Table 5.4], and is essential for better HDR. This suggests two distinct goals for system builders, ecosystem detection and ecosystem information display, as ingredients to help overcome the obstacle. As a representative example, let us examine a recent app called SubsCrab [Figure 10.3]:
This app connects to the user’s e-mail account and monitors it for e-mails from service providers such as Netflix, Spotify, Dropbox, or Google with which the user has monthly or annual subscriptions. In doing so, it is detecting part of the user’s ecosystem: it is identifying which companies they have a payment relationship with. It parses found e-mails to identify billing dates and payment amounts, then provides representations of that ecosystem information to the user, so that they might get on top of their subscriptions, see what they need to pay (or cancel), and feel more ‘in control’ [Teevan (2001); 2.2.2] of this aspect of their digital life. From this example, it is easy to imagine other types of ecosystem detectors, which could detect relationships with free services and websites, and identify account numbers and e-mail addresses, password resets, address book syncs, OAuth logins, family identities and more. Alistair Croll and I explored possibilities for inbox scanning in 2009 (Croll and Bowyer, 2009), and while there has been some innovation in this space since, it has largely had a commercial focus (Braun, 2018). New ecosystem detectors could power new interfaces, contributing to the simplification of the user’s digital life and giving people more visibility and control over their previously unmanageable data ecosystem.
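An ecosystem detector of this kind can be sketched as a scan over (sender, body) pairs from an inbox. This is a hypothetical reconstruction of the general technique, not SubsCrab’s actual implementation; the provider list and regular expression are illustrative.

```python
import re

# Sender domains treated as subscription providers (illustrative list).
KNOWN_PROVIDERS = {"netflix.com": "Netflix", "spotify.com": "Spotify"}
AMOUNT = re.compile(r"[£$€](\d+(?:\.\d{2})?)")

def detect_subscriptions(messages):
    """Scan (sender, body) pairs for billing e-mails and return the
    detected payment relationships, with any amount found per provider."""
    found = {}
    for sender, body in messages:
        domain = sender.rsplit("@", 1)[-1]
        if domain in KNOWN_PROVIDERS:
            match = AMOUNT.search(body)
            found[KNOWN_PROVIDERS[domain]] = (
                float(match.group(1)) if match else None
            )
    return found
```

Each detected relationship is a piece of ecosystem information that no single provider would volunteer, assembled entirely from the user’s own inbox.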
A secondary consideration in achieving the required ‘sea change’ in approaches to HDR is that current PDS and SI approaches are very life-information-centric. It is implicitly assumed that the only way to unite data is to collect it; the difficulty with such an approach is that you can only collect that which you can extract. To address this, I draw inspiration from the computer programming concept of pass by reference (as opposed to pass by value) (Ananya, 2020), where data is ‘pointed to’ rather than moved. Productivity guru David Allen similarly recommends the use of ‘placeholders’ (Allen, 2015) to keep track of tasks you cannot otherwise bring into your planning. To build a complete map of a user’s ecosystem, we must be able to keep track of accounts and data that are remote, much like a search engine points to information on different pages around the web. We can create proxy representations of service-provider-held or otherwise immobile data (e.g. data which is offline or restricted). These representations become part of the manipulable material in the user interface, and could be augmented with links to visit the remote source.
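A minimal sketch of the pass-by-reference idea for ecosystem mapping follows, with invented field names: remote data is represented by a placeholder that points at its holder, and sits alongside actually-collected data in the same manipulable collection.

```python
from dataclasses import dataclass

@dataclass
class RemoteDataReference:
    """A placeholder that points at data held elsewhere, instead of
    copying it in (pass by reference rather than pass by value)."""
    holder: str        # who holds the data (illustrative)
    description: str   # what the remote data represents
    link: str          # where to visit the remote source

@dataclass
class CollectedData:
    """Data that has actually been extracted into the personal store."""
    description: str
    value: object

def describe(item):
    """Both kinds of item appear in the same unified collection;
    references simply declare where the real data lives."""
    if isinstance(item, RemoteDataReference):
        return f"{item.description} (held by {item.holder})"
    return item.description
```

The ecosystem map thus stays complete even where extraction is impossible: what cannot be moved is still pointed to.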
Metadata is what gives information context. Context is critical to sensemaking [2.2.3] and enables good experience-centred design [2.3.2; 2.3.3]. Without context, data loses meaning [5.5.3]. Collecting historical data about the individual is important for reflection [2.2.3] and considered valuable [4.4.3]; moreover, knowing the history of a piece of data allows its context to be understood. Data is not neutral, and is inherently biased, since it was created for a specific purpose with a specific agenda in mind (Gitelman, 2013; Neff, 2013). To combat this bias, more context is needed. Significant research in this space has been undertaken by Professors Mike Martin and Rob Wilson at Northumbria University, formerly Newcastle University, who promote the idea of data with provenance; in other words:
Data must carry with it the details of why it exists, how it came to be, and what has happened to it since its inception.
Provenance should be communicated alongside any visualisation of the data, in order for it to be fairly assessed in context. Provenance is essential for data to be trusted, argues Martin, and should be quite granular: a piece of data should be attributed not just to an individual or organisation, but to the relationship between role-holding individuals in a specific context. Greater insights can be gained when considering all actions upon data as motivated communications from one party to another; only by capturing this information in-situ can the data’s meaning be fully appreciated (Martin, 2022). This framing essentially advances the concept of history tracking [2.2.3] into the sociotechnical, ecosystem-aware problem space. While everyday system designs have not approached this level of granularity, the importance of data provenance has been recognised in the PIM space. Temporal PIM systems [2.2.2] from Lifestreams (Freeman and Gelernter, 1996) to activity streams (Hart-Davidson, Zachry and Spinuzzi, 2012) rely upon data provenance in some form. A study by Jensen et al. concluded that provenance tracking can be valuable for identifying related documents, a critical part of knowledge work today (Jensen et al., 2010). Lindley et al. proposed the idea of file biographies, which view the lifetime of a file as something that should remain connected, so it could be traversed in order to understand the context of the file at different moments of interaction (Lindley et al., 2018). This comes close to Martin’s vision but does not capture the motivation for each interaction. While provenance capture is not a solution in its own right to the understanding of data and of ecosystems, it is clear that data with provenance is very likely to be a valuable part of any design that aims to provide understanding of complex and invisible personal data ecosystems.
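The core of data-with-provenance can be sketched as data that carries its own history of motivated actions, in the spirit of Martin’s framing of actions as communications between role-holders. The structure below is my own illustrative rendering, not a specification from that work.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ProvenanceEvent:
    """One motivated action upon the data: who acted (in what role),
    what they did, and crucially why."""
    actor: str
    action: str
    purpose: str
    when: datetime

@dataclass
class DataWithProvenance:
    """A value that never travels without the history of why it
    exists, how it came to be, and what has happened to it since."""
    value: object
    history: list = field(default_factory=list)

    def record(self, actor: str, action: str, purpose: str):
        self.history.append(
            ProvenanceEvent(actor, action, purpose, datetime.now())
        )
```

Any visualisation of `value` can then surface `history` alongside it, so the data is assessed in context rather than taken as neutral fact.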
At Hestia.ai [See Section IV Introduction] we produced a model to explain the mechanisms by which technology companies gain power and use it to shape today’s digital landscape. In this model, infrastructural power comes from three things:
As organisations (especially platforms such as Google or Facebook) collect more data, and grow in market influence or technical capability, they gain power over individuals and over other organisations. They exert this power using four ‘levers’. Simplified and expressed in the terms of this thesis, and shown in Figure 10.4, these are:
The precise mechanisms and techniques employed when exerting infrastructural power, as well as the social and market consequences of these practices, are explored in detail in Hestia.ai’s digipower technical reports, of which I was a co-author (Bowyer, Pidoux, et al., 2022; Pidoux et al., 2022).
The research highlights that platform providers’ power is far greater than many realise. Unlike in the physical realm, providers of popular online platforms can reconfigure the landscape to change the way that individuals perceive reality, in line with the powers of interpretative influence, behavioural influence and socially shaped power described above (Bowyer, Pidoux, et al., 2022). Providers control the extent to which (if at all) data stored behind the scenes, and internal processes that use that data, are visible, and how data and processes are represented. Common to the last three levers is an implicit, if not malicious, dishonesty; all three rely upon changing other actors’ perceptions and understanding of the landscape, such that they might be more likely to act in the platform provider’s interest.
The model shows that the accumulation of data (and hence, information) is implicitly and objectively a form of power, consistent with participants’ observations in 5.5.4. As long as current service providers are free to collect so much personal information, the information landscape is likely to remain imbalanced and individuals will not be able to acquire ecosystem negotiability.
This insight shows that the most powerful data holders exert huge influence over the digital landscape, in terms of what is knowable and what is doable. HDR reformers’ ability to rebalance the landscape is hindered by the fact that they are operating in a landscape that the incumbent platform and service providers effectively control.
This principle aims to combat the challenges outlined in 8.5: that there is a lack of individual demand, and a question over the commercial viability of human-centred information systems. Given the earlier observation that life information is what makes data relatable, the insight I offer here is that the way to make people care about their data is to use it to help them in their life. The way people find value in data is to connect it to their lives. The more that people see relatable life information and can imagine ways to harness that information in their everyday life, the more motivated they are likely to be. By starting with a focus on a user’s world, one can then focus in on their life, and then on the data that represents elements of that life. Then, the individual has a vested interest; the abstract becomes meaningful and the theoretical becomes possible. Systems and features should be designed from this life-centric perspective, identifying what people want to achieve and designing around these needs. This is known as value-centred design (Reber and Duffy, 2005), and it has been argued that this should become the guiding design philosophy in HCI (Cockton, 2004). And to offer true individual value, all human-centric system designs must also consider context [2.3.2], environment (Abowd, 2012) and experience [3.2.1].
However, a key consideration for design in this space is that any technology that handles or represents personal data should be designed in a way that is value-sensitive. This is distinct from value-centred design, which focuses on shaping a technology design directly around the needs of the user it serves. As Batya Friedman explains (Friedman and Hendry, 2019), value-sensitive design is about ensuring that human values are incorporated into the design thinking behind every feature of a system.
BBC R&D explored how to better connect people with their data, and in doing so gained clarity on what those values might be. A research project (Forrester, 2021) identified fourteen specific Human Values that people seek to satisfy in their lives, shown in Figure 10.5. These are, at the most abstract level, goals that people care about in their daily existence.
Applying the principles of value-sensitive design, it will be critical to keep these desires in mind when designing human-centred information systems. If a design does not help with one or more of these values, it will be less likely to succeed. Conversely, these values may offer designers a way to combat the indifference and apathy of users described in 8.5.1.
Moving from design philosophy to design practicalities, how can we begin to design a human-centric information system that people want to use? To help with this design challenge, I look to business modelling, where a tool called the value proposition canvas is used. This identifies three ways of conceptualising value: gain creators, pain relievers and jobs-to-be-done. Against this framing, we can design better human-centric functionality that relieves an individual’s pain points, helps them complete their tasks, or offers them some gain over the status quo. In the HDR space, given the lack of existing tools for digital life management, we have the opportunity to create quite a unique type of gain: new capabilities over your digital life that you have never had before. This ability to do new things has been identified as a key ingredient of user empowerment (Meschtscherjakov, Wilfinger and Tscheligi, 2014; Schneider et al., 2018). As 2.1.4 and 2.2.2 showed, a range of novel capabilities are needed for effective PIM.
Here is an example of what this value-centric approach might look like in the HDR space: BBC R&D colleague Jasmine Cox and I imagined focusing on address books and contact lists as a strong, relatable starting point for generating demand for a human-centric interface. This could provide people with new life capabilities while also relieving pains. Many people have address and contact information scattered far and wide, and face a complexity they cannot easily manage when it comes to the automated syncing and sharing of potentially sensitive contact information between devices, apps and providers. Developing human-centric personal information management capabilities to bring that messy situation under control would offer a clear and tangible benefit to users. Figure 10.6 shows a possible strategic path built around this need that we mapped out, beginning with detecting ecosystem and life information from the individual’s calendar and e-mail inbox, and building up to more holistic life-level PDS capabilities.
A helpful example is that of a vacation, taken from my 2011 article (Bowyer, 2011) and shown in Figure 10.7. Today, all the information around such a holiday is scattered across multiple systems: emails, online provider bookings, chat logs, cloud-synced photos, web browser bookmarks, smartphone location logs, etc. It is not hard to imagine that a system able to bring all related information about that vacation together in one central interface (mock-up in Figure 10.8) could deliver huge value to users and be very compelling. Such context-targeted human-centric offerings have a much greater chance of generating interest and impact than offerings that merely allow you to ‘organise your data’ or fulfil some other abstract phrasing not rooted in everyday life.
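The vacation scenario suggests a simple underlying operation: gathering items from scattered sources that share a life context, and ordering them into one timeline. A minimal sketch, assuming items have already been tagged with a context identifier (the tag name and item fields are invented):

```python
def gather_life_context(items, context_tag):
    """Pull together items scattered across sources (emails, photos,
    bookings, ...) that share a life-context tag, in date order."""
    return sorted(
        (item for item in items if context_tag in item["tags"]),
        key=lambda item: item["date"],  # ISO dates sort correctly as strings
    )
```

A real system would face the harder problem of assigning those tags in the first place; the retrieval itself, once data is united and tagged, is almost trivial.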
In order to move towards standardised ways to store and unify personal data from multiple sources, computer systems must be taught to understand the information within the data, and how it relates to an individual and the world. This moves beyond just capturing data provenance: put simply, computers need to understand human information. They need to move beyond files (Bowyer, 2011) and databases, and begin to perform operations on human informational concepts, and to associate those concepts according to what they mean - i.e. semantically. This is a preliminary step that will enable the building of systems and interfaces that are able to deal in human concepts and represent the elements of everyday life.
We need to store semantic context and semantic associations, i.e. the meaning of things, not just raw bundles of data. This is advocated by the Web’s inventor Tim Berners-Lee in his vision of a Semantic Web (Berners-Lee, Hendler and Lassila, 2001) and by other proponents of networked and semantic PIM systems [2.2.2]. There is a need to develop standard ways to digitally model facts and assertions about users’ lives, so that those disparate pieces of data can be unified, connected, correlated and compared. Some standards are emerging, such as data shapes (‘ShapeRepo: Make your apps interoperable’, 2022). The extraction of meaning from data has a business domain all of its own. Sizable industries have built up around Content Analytics and Enterprise Content Management. But to consider the problem at its simplest level, I offer this insight: Through the capture of metadata at the point of data recording, and through subsequent programmatic analysis of stored data, as illustrated in Figure 10.9, we can begin to teach computers what the data we store represents.
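A minimal sketch of modelling facts and assertions about a life semantically follows, using subject-predicate-object triples in the spirit of the Semantic Web; the vocabulary and fact store below are invented for illustration, not an implementation of any particular standard.

```python
# A tiny triple store: each fact about a life is a
# (subject, predicate, object) assertion.
triples = set()

def assert_fact(subject, predicate, obj):
    """Record one semantic assertion about the individual's life."""
    triples.add((subject, predicate, obj))

def query(subject=None, predicate=None, obj=None):
    """Find facts matching a pattern; None acts as a wildcard, so
    disparate data can be connected, correlated and compared."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]
```

Because the store holds meaning rather than raw bundles of data, the same fact can be reached from many angles: by person, by relationship, or by thing.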
Systems that store data without any understanding of what it represents in the physical/human world will be less powerful and versatile than those that we can train to understand our world. The emergence of Large Language Models (LLMs) like ChatGPT in late 2022 shows a promising trajectory here, but such models also carry great societal risk: they only understand human language, not mathematics, logic, science, or how to distinguish truth from opinion, all of which will be critical if accurate, reliable and trustworthy human-centric life interfaces are to be developed.
Machine learning technologies and Artificial Intelligence have pushed machine understanding of human words, images and content to impressive levels in recent years, and such technologies can certainly be helpful. But at its core, what we are talking about here is something much simpler than AI: automatically labelling datapoints in as many different ways as possible (using a similar principle to lifelogging [2.2.3]) so that those datapoints can be associatively retrieved from many different angles, and providing humans with ways to amend incorrect labels, reclassify data, or apply new semantic associations. Such approaches are in their infancy, and have not yet been adopted extensively in commercial settings. Issues of interoperability for PDS systems are being actively explored and developed in the Solid community [Berners-Lee (2022); Bansal (2018); 9.2] in pursuit of a decentralised web (Verborgh, 2017).
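The multi-angle labelling just described can be sketched as a simple inverted index, with human amendment of labels built in. The function names and labels are illustrative only:

```python
from collections import defaultdict

# label -> set of datapoint ids carrying that label
index = defaultdict(set)

def label(datapoint_id, *labels):
    """Attach many labels to one datapoint, so it can later be
    retrieved associatively from many different angles."""
    for l in labels:
        index[l].add(datapoint_id)

def retrieve(*labels):
    """Datapoints matching all of the given labels."""
    sets = [index[l] for l in labels]
    return set.intersection(*sets) if sets else set()

def relabel(datapoint_id, old, new):
    """Let the human amend an incorrect label."""
    index[old].discard(datapoint_id)
    index[new].add(datapoint_id)
```

Nothing here requires AI; machine learning only enters as one possible way of proposing the labels, which the human can always correct.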
In this principle, which again looks at a more sociotechnical design perspective, I will explain how one person can apply the discovery-driven activist approach to compel a multi-billion-dollar international data-centric organisation to improve their HDR.
As an avid user for several years of the music streaming service Spotify, I have built up a large library of playlists. I was interested to build an app using my listening data, so I made a GDPR request to get a copy of my personal data. When I received that data, I was disappointed to find it was not suitable for programmatic use, because the tracks in my listening history were identified not by unique identifiers such as spotify:track:4cOdK2wGLETKBW3PvgPWqT, which I could use to construct clickable song links, but only by freeform text strings. Through a long and complicated saga, explained in detail in ARI9.1, which involved much persistence and sending over 30 e-mails in an eight-month period, I was ultimately successful in getting Spotify to improve the format of their GDPR data returns, not just for me but for all customers who make GDPR requests in future. I had proven that, with persistence, one individual can use their GDPR rights to exert power over a corporation.
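The episode turns on a small technical point: a stable track URI can be mechanically turned into a clickable link, whereas a freeform text string cannot. A minimal sketch of that distinction, assuming the standard open.spotify.com link format (the function itself is my own illustration):

```python
def track_link(track_ref: str) -> str:
    """Turn a Spotify track URI into a clickable web link.
    Freeform text strings cannot be resolved this way, which is
    why identifier-free GDPR exports are unusable programmatically."""
    prefix = "spotify:track:"
    if not track_ref.startswith(prefix):
        raise ValueError("not a track URI; freeform text is not linkable")
    return "https://open.spotify.com/track/" + track_ref[len(prefix):]
```

One line of string handling is all that stands between a dead data dump and a useable one, which is why the format of a GDPR return matters so much.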
A larger scale example of individuals forcing giant corporations to change is seen in the case of Facebook. In the early 2010s, Austrian lawyer Max Schrems began to pressure Facebook to disclose more personal data to their users. He created a tool to enable people to make their own data access requests, which over 40,000 people used. Faced with an overwhelming volume of work and massive liability of future data access requests, Facebook was forced to launch the self-service Download Your Information (DYI) download tool, increasing transparency for all Facebook users worldwide (Solon, 2012). Facebook was forced to increase its transparency further when Paul-Olivier Dehaye (now CEO of Hestia.ai) made a GDPR request (later backed by legal action) to force Facebook to disclose more information about which advertisers Facebook had enabled to target him using the Facebook Custom Audiences feature. Apparently in order to avoid being embarrassed in court, Facebook updated DYI so that every user’s downloaded information includes a list of advertisers who have added you to a Custom Audience (Dehaye, 2017). Dehaye and Schrems both continue to act as HDR reformers and civic hackers following the discovery-driven activism approach, through their organisations Hestia.ai [See Section IV Introduction] and privacy rights organisation noyb.eu (‘none of your business’) (Schrems, 2017) respectively.
Continuing the sociotechnical design focus, it is important to recognise that groups of individuals may be able to achieve more than individuals can on their own. Increasingly, the Internet that each individual experiences is not the same as that experienced by anyone else. Thanks to recommendations, targeted ads and social media feeds personalised to your interests, no two people will see the same digital reality. This makes it much more difficult for regulators or individuals to hold digital service providers to account than their analogue counterparts. In recent years, many activists have embraced the power of collectives to try and fill this regulatory gap, realising that together, they can discover far more than they can alone, and that through such collaboration there is an opportunity to improve awareness of digital providers’ practices.
An example of this is the WhoTargetsMe project, launched in 2017 (Jeffers and Webb, 2017). The objective of this project was to monitor political advertising in the UK. Recognising (as larger studies have shown (Bakshy, Messing and Adamic, 2015)) that everyone was seeing different advertisements, the goal was to have each individual report what adverts they see on Facebook, so that these can be pooled and compared with others. Over 50,000 people participated, building up an otherwise unavailable picture of the ways in which different political demographics were being targeted. This is a powerful mechanism available to collectives in this space: the ability to have individuals obtain their own datapoints and then compare them.
Another example is seen in the Worker Info Exchange (‘Worker info exchange’, 2022), a collective that helps gig economy workers such as Uber drivers and Deliveroo riders to make data requests. Using the pooled data, they conduct investigations to understand algorithmic inequalities and identify unfair treatment of workers by employers. They then help those workers to fight for better working conditions, much like a traditional trade union, but powered by collectively-sourced data. This resulted in Uber being taken to court, and some gains being made for drivers (Lomas, 2021; Foucault-Dumas, 2021).
As the aforementioned case with Max Schrems showed [Principle 9], collectives can be particularly powerful when exerting their data access rights en masse, and this can improve HDR and force greater transparency. René Mahieu and Jef Ausloos have published an exhaustive list of collective actions taken using GDPR rights, addressing issues such as discrimination by US colleges, corporate surveillance of climate activists, identifying gaps in data disclosures, and manipulation of users on dating apps (R. Mahieu and Ausloos, 2020). The authors identify that the GDPR provides an architecture of empowerment and have called for better enforcement and for European authorities to provide better support for collectives to make data access requests together (R. L. P. Mahieu and Ausloos, 2020). Hestia.ai’s digipower investigation [See Section IV Introduction] concluded that data-discovery driven collectives are a vital step on the road to a more digitally empowered society (Pidoux et al., 2022, p. 70). It is clear that organised collectives exploiting data access rights represent a powerful vector for impactful discovery-driven activism.
Having identified the need to assign every piece of a user’s life information to a particular partition (or multiple partitions) of their life, it quickly becomes apparent that this would be too much work for the user to do alone. Systems that use manual categorisation and tagging to classify information work best with a large userbase to contribute effort to the classification operation (Golder and Huberman, 2006). As part of the explorations of PDS approaches at BBC R&D, I therefore also examined how this challenge might be addressed (considering also that effort could be a deterrent to adoption [Objective 5 [8.5]]). I identified an approach that could help with this problem: if the entities (for example, a person, a place, an event or a topic) associated with a piece of data can be programmatically identified, then much of the assignment of data to life partitions can be handled automatically. For example, any data associated with your office location is likely to relate to the ‘work’ part of your life, and this assignment could be made automatically, reducing the effort for the PDS user. The process of identifying entities within data, known as entity extraction or named entity recognition (NER), is a well-established technique, which relies on the trained recognition of proper nouns and keywords combined with the statistical analysis of sentence grammar (Marshall, 2019). This technique is used extensively in text-mining products within the Content Analytics industry such as those produced by my former employer, OpenText (formerly nStein) (‘What is text mining and content analytics?’, 2022). However, in the context of a PDS, I propose that new techniques can be applied, making use of the data touchpoints into different parts of an individual’s life to identify entities relevant to them personally (including, for example, names of friends or private projects that a standard NER solution would not detect).
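To make this concrete, the personally-relevant matching proposed here can be sketched as a simple gazetteer lookup. This is an illustrative sketch only: the entity names, partitions and example text are invented, and a real PDS would combine such a personal gazetteer (built from the user’s own data touchpoints) with a trained NER model for generic entities.

```python
# Illustrative sketch: a personal gazetteer lets a PDS detect entities
# (friends, private projects, places) that a generic NER model trained
# on public text would miss. All names below are invented.

import re

# Hypothetical gazetteer mapping each personally-relevant entity
# to a life partition, derived from the user's own data touchpoints.
PERSONAL_GAZETTEER = {
    "Priya": "friends",
    "Project Nightjar": "private-projects",
    "Ouseburn Studio": "work",
}

def find_personal_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity, life_partition) pairs found in a piece of data."""
    hits = []
    for entity, partition in PERSONAL_GAZETTEER.items():
        # Word-boundary match so 'Priya' does not match inside 'Priyanka'.
        if re.search(rf"\b{re.escape(entity)}\b", text):
            hits.append((entity, partition))
    return hits

print(find_personal_entities(
    "Met Priya at Ouseburn Studio to plan Project Nightjar."))
# → [('Priya', 'friends'), ('Ouseburn Studio', 'work') and
#    ('Project Nightjar', 'private-projects'), in gazetteer order]
```

Each detected entity carries its partition with it, so any datapoint mentioning the entity can inherit that partition automatically.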
Data is full of references to entities that have personal relevance in your life. Finding these allows meaningful metadata to be attached to each datapoint. Figure 10.10 shows how a large number of entities could be detected from different parts of an individual’s data once it has been imported into a PDS environment:
This sort of approach could be quite powerful in reducing the effort for life interface users. By scanning the data, the most prevalent entities could be identified, and the user need only assign the entities to different parts of their life, as illustrated in the first two frames of Figure 10.10. This would then allow hundreds of associated data points, which had been programmatically associated to that entity, to be assigned to the correct ‘bucket’ or life partition. I was able to prototype this technique successfully to prove the concept, as described in [ARI7.1].
While such an approach would not be perfect, and there would need to be some corrections made by the user, this is far preferable to them having to provide all the classifications themselves, and is likely to motivate greater engagement. I have observed in user experience design and consideration of productivity systems that users are more motivated to correct errors than to fill in a blank page.
Philosophically, we are moving here towards a learning system: a system that can be told when it is right and when it is wrong, and get better at classifying things correctly, analogous to the way an executive might train an assistant to anticipate their needs better, a sort of digital life assistant (Bowyer, 2018). Bayesian classification techniques could also be used to help with the learning here (Authors, 2022). This approach is also useful for ecosystem detection–as outlined in Principle 4–as identification of relationships with external entities is a key first step to mapping a user’s ecosystem.
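A minimal sketch of how such a learning system might work, assuming a bag-of-words naive Bayes model over tokens extracted from each datapoint, is shown below. All training data is invented; the key property is that the `learn` method can be called both for initial labelling and whenever the user corrects a misclassification, so the system improves over time.

```python
# Sketch of a learning classifier for life partitions: multinomial
# naive Bayes with Laplace smoothing, trained incrementally from
# user labels and corrections. Example data is invented.

from collections import Counter, defaultdict
import math

class PartitionClassifier:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # partition -> token counts
        self.doc_counts = Counter()              # partition -> labelled examples

    def learn(self, tokens, partition):
        """Called on initial labelling AND whenever the user corrects us."""
        self.doc_counts[partition] += 1
        self.word_counts[partition].update(tokens)

    def classify(self, tokens):
        """Pick the partition with the highest posterior log-probability."""
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for partition, docs in self.doc_counts.items():
            vocab = self.word_counts[partition]
            total = sum(vocab.values())
            score = math.log(docs / total_docs)  # prior
            for tok in tokens:
                # Laplace smoothing so unseen tokens don't zero the score.
                score += math.log((vocab[tok] + 1) / (total + len(vocab) + 1))
            if score > best_score:
                best, best_score = partition, score
        return best

clf = PartitionClassifier()
clf.learn(["standup", "sprint", "deadline"], "work")
clf.learn(["picnic", "birthday", "kids"], "family")
print(clf.classify(["sprint", "review", "deadline"]))  # → work
```

Because every correction is simply another `learn` call, the error-correction interaction described above maps directly onto the training loop.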
INSIGHT 12: The ‘Seams’ of Digital Services need to be Identified, Exploited and Protected.
As identified in 8.4.1, product design (be it hardware or software) is political. Designers pass some power to the user through their design, but users should also be able to take some power on their own terms. This is the case made by Cristiano Storni in his 2014 paper on the politics of seams, in which he identified the idea of empowerment-in-use: the idea that people need to be able to appropriate their technologies to uses that the designers may not have foreseen (Storni, 2014). This is blocked by current black-box, limit-what-the-user-can-do thinking. Central to this capability is the concept of seams - those exposed areas which the user is free to change. This concept was proposed by Mark Weiser and developed by Chalmers and others (Weiser, 1994; Weiser and Brown, 1997; Chalmers, MacColl and Bell, 2003). Changes such as closures of APIs or removal of ports [8.4.2] can be seen as the removal of seams. As Storni highlights, the availability of design seams is a critical determiner of user power. Companies gain power and reduce agency when they remove or restrict activity at seams. It follows that by identifying, exploiting and protecting the seams of digital services and devices, user autonomy and the viability of data-unification efforts can be protected.
An unseen battle for the free flow of information is underway at the seams of today’s digital products.
Hackers, civic activists and makers seek to repurpose and exploit the edges of products for their own means, while digital service providers and platforms try to block such activity. For example:
These examples make it quite clear that Storni was right: product seams are the place where control can be asserted or regained. They are the setting for an ongoing battle for the freedom and integrity of today’s information landscape, and it is important for HDR reform that this space is specifically targeted. The role of the HDR reformer here is twofold:
In this context, the work of whistleblowers such as Frances Haugen (Horwitz et al., 2021) and Edward Snowden (Macaskill et al., 2013) is both validated and particularly important. Whistleblowers can expose internal practices that harm the information landscape’s integrity and are not otherwise visible. In order to hold online platforms to account, the public must be aware and able to attribute any restriction in freedom or information access to the correct source. They need to know that the information or functionality is being modified or restricted. These ideas are explored further in (Bowyer, 2017). Seams should be much more in the public consciousness than they are.
As outlined in 8.5 and in this section, it is not just possible but essential that work is done to persuade data-holding organisations of the benefits of moving towards the new paradigms outlined in this thesis. The following avenues for possible future research and advocacy toward data holding organisations have been identified:
The 13 principles that have been asserted in this section are by no means the only principles that guide the HDR agenda, nor are they necessarily the most important. However, they are the ones that I have discovered, understood and refined through the pursuit of the six human needs [Chapter 6] through my embedded involvement with the four projects [See Section IV Introduction] as well as self-experimentation and prior design thinking. They are located here within the conclusion of the thesis because they constitute the most tangible and actionable learnings of the thesis; the first steps that will guide an HDR reformer as they embark on the pursuit of the HDR agenda in their design, research, transformation or innovation work. The remainder of this chapter will reflect upon the challenges of pursuing that agenda and the legacy, impact and limitations of this body of research.
This section will reflect upon the proposed objectives and approaches towards better HDR laid out in Sections III and IV, from a critically pragmatic and broad perspective. One of the first things to note is that while Chapter 8 identified many of the clearest obstacles to the HDR objectives, it should by no means be considered exhaustive or complete. It is important to recognise the uphill struggle that faces anyone pursuing an HDR agenda, and that there are many forces working against the HDR reformer, including but not limited to commerce, resistance to change, insufficiently effective regulation, insufficient funding, technical challenges, disinterest by almost all parties, and the normalising effect of the status quo. HDR requires not just a halt of current data-centric trends but an about-turn and pivot to a completely different model of organisational personal data handling. In the Principles, Insights and Approaches presented in Sections IV and V through which I attempt to operationalise the HDR agenda, I lay out a direction of travel, highlighting the actions that appear to have the best chances of success, not a complete answer. The HDR agenda is explored both from within the system and from outside the system, and it is possible that even with parties pursuing both conformist and activist approaches in parallel, this may not be enough to bring about the desired change. Nonetheless, the cause of better human-data relations is clearly a worthwhile one for society.
The challenges of better Human Data Relations are intrinsically sociotechnical. They cannot be solved by technological design, by law or by social change alone. This was one of the reasons why it was important to lay out a variety of approaches in the recommendations from the Case Studies in Chapters 4 and 5, and in the HDR approaches in Chapter 9. One benefit of this, however, is that practitioners from many different fields ranging from technologists and designers to journalists, activists and civil servants should be able to find something in the Principles or Approaches that they can build upon. One of the underlying needs identified in the findings of Case Study Two [5.6.1], and an area where Chapter 9’s approaches hope to influence governments, is that changes in law are required. Like other jurisdictions, both the EU [9.5] and the UK are, at the time of writing in 2023, developing new or changed laws that will affect the use of personal data online; but their interests are not only to protect their citizens, but to help businesses too, not to mention any political interests of those in power. As such, we cannot expect that all legal changes will align with the HDR agenda; in fact, in the UK there are serious concerns that new data protection laws set to replace GDPR following the UK’s exit from the EU could ‘seriously weaken data protection rights’ and cause harm to marginalised communities (Cowburn, 2023). Even the existing GDPR laws, which are designed from a more individual-centric perspective, fail to deliver sufficient benefits in practice, as Case Study Two showed [5.5.1]. So, it is important to acknowledge that while regulation is one of the most promising routes to achieving better HDR, it may not happen and it may not be sufficient even if it does happen. It would also require a more paternalistic attitude to protecting individuals, which goes against the libertarian trends of the early 2020s political landscape in the West.
In 1.2.3 and 9.6.1, I highlight the need for better education and data literacy around Human Data Relations: the need for better understandings about the intrinsic personal value of data as a source of life information, and the need for better awareness and capability when it comes to exercising data rights and recognising exploitations of power or diminishments of individual agency. This should not be taken as a criticism of the public for not knowing enough, nor does it imply that it is the responsibility of ‘the people’ to demand fairer HDR from providers. We need better data literacy at all levels of society, and perhaps particularly amongst policymakers, public and civic service providers, and politicians, as well as journalists and public figures of all kinds. By targeting those individuals who shape future legislation and have the greatest capacity to influence business and societal norms as well as public opinions, we stand the greatest chance of creating impact. So far, the problems of disempowerment over personal data represent a systemic failure to recognise the negative impacts upon individuals and act in the interests of individual liberty across all aspects of society—including technologists, lawmakers, and media. If the issues are properly understood and pursued by those in positions of power and influence, which is not easy given their complexity, then it should not become a burden upon citizens to become aware of injustice and demand better HDR, but rather the system would already be changing to meet their needs through the actions of those who shape the systems and processes of society. This is why projects such as digipower [See Section IV Introduction], which aimed specifically to raise awareness of digital power issues among influential European VIPs, and participation in policy consultations by experts, such as my contribution to the GDPR Guidelines consultation [9.5.3], are vital.
HDR reformers will need to continue to educate and influence those who are best placed to change society.
The issue of individual consent to data sharing has been covered at length in this thesis. In the pilot study [Bowyer et al. (2018); Appendix A] and Case Study One [4.5.1], the framing of consent is used to consider whether or not individuals feel sufficiently consulted in matters pertaining to the use of their data, with the conclusion being that dynamic consent is required, and that informed consent produces a ‘point of severance’ beyond which individuals lose control. However, it is important to recognise that consent is just one of the lawful bases for data processing (‘A guide to lawful basis’, 2022), and that data holders may decide to rely on a basis such as ‘legitimate interests’ or ‘fulfilling a contract’ which reduce the requirements upon the data holder. In particular, the latter removes any ability for the individual to object to processing or to have data erased. There is a trend to move away from consent, as TikTok recently attempted to do, and as was rumoured by a local authority in the Early Help context [8.4.2]. It is important to recognise, therefore, that this is a possible vector through which data holders might act against HDR and reduce individual data rights over the personal data they hold in future. Just as with the proposed changes to UK data protection law mentioned above, this would represent a serious diminishing of individual agency, and the ability of activist and research groups to investigate their personal data [9.3.1] would be crippled by such a change.
Personal data rights are also not universal rights that cover all cases in which data is held about individuals. Personal data is a tightly-defined term relating to data that does or can identify individuals [ARI2.1]. Holders of anonymised, pseudonymised or aggregated data, or of data from which individuals are not easily identified (for example crowd CCTV recordings, traffic flow data or Large Language Models (LLMs)) commonly argue that rights such as those under GDPR do not apply to their datasets. Such datasets fall into legal, technical and moral grey areas. People still feel a sense of ownership over such data, and perhaps rightly so. For example, Apple—lauded for its improved approaches to privacy—states in its own privacy policy that it may keep information extracted from Siri voice recordings, but only once they have been de-identified and ‘attached to a random identifier’. This means that the content of that voice command is held by Apple and may still be revealing or personal, but is now no longer accessible or erasable by the individual nor subject to any data protection law. Several researchers have shown that anonymous data can be ‘re-identified’, in one study by using as few as 15 points of data (Bradbury, 2019). As such, we can see there is a gap in individual rights and protections over some types of data. Morally, individuals still deserve a relationship to that data and the information within it. Whether this moral claim can be actualised in law or technical capability remains to be seen. The findings of Case Study Two would suggest that such an approach, once understood by the individual service user, may harm their trust in their provider. Furthermore it could have the potential to generate feelings of having no choice [4.4.3] or of powerlessness [5.5.4].
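The linkage-attack principle behind such re-identification studies can be illustrated with a toy sketch. All records below are invented, and real attacks involve far larger datasets, but the mechanism is the same: when a combination of quasi-identifiers is unique, joining an ‘anonymised’ dataset against any public dataset that carries names re-attaches identities to the supposedly anonymous records.

```python
# Toy illustration of re-identification via quasi-identifiers.
# All records are invented for demonstration purposes.

# 'Anonymised' dataset: names removed, but quasi-identifiers remain.
anonymised_health = [
    {"postcode": "NE1 7RU", "birth_year": 1984, "gender": "F", "condition": "asthma"},
    {"postcode": "NE6 5QB", "birth_year": 1990, "gender": "M", "condition": "diabetes"},
]

# A separate public dataset (e.g. an electoral roll) with names attached.
public_register = [
    {"name": "A. Example", "postcode": "NE1 7RU", "birth_year": 1984, "gender": "F"},
    {"name": "B. Sample", "postcode": "NE6 5QB", "birth_year": 1990, "gender": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")

def reidentify(anon_rows, public_rows):
    """Join the datasets on quasi-identifiers; unique matches re-identify."""
    results = []
    for anon in anon_rows:
        key = tuple(anon[q] for q in QUASI_IDENTIFIERS)
        matches = [p for p in public_rows
                   if tuple(p[q] for q in QUASI_IDENTIFIERS) == key]
        if len(matches) == 1:  # unique combination => identity revealed
            results.append((matches[0]["name"], anon["condition"]))
    return results

print(reidentify(anonymised_health, public_register))
# → [('A. Example', 'asthma'), ('B. Sample', 'diabetes')]
```

With only three quasi-identifiers the combinations here are already unique; the cited study’s 15 datapoints make uniqueness near-certain even at population scale.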
The principle of unifying and uniting data and hence life information [Principle 2] would clearly be of significant value to individuals in their quest for better HDR, allowing patterns and behaviours across devices, platforms and services to be visualised and reflected upon. However, it may be in some cases unachievable due to technical, legal or corporate policy reasons. An example of this was experienced in the SILVER project’s attempts to build a health data interface for Early Help Support Workers [see Section IV Introduction]. Prior to the building of the prototype, support workers reported they experienced the sense of a ‘wall’ between different public sector agencies, especially between healthcare providers and social care providers. Employees of healthcare providers felt they would be violating internal policies and potentially law if they made personal data available, even if it would be in the individual’s interest to do so. And similar barriers were encountered when trying to build technical channels for data-sharing. Even once access to the requisite APIs was acquired, local authority and local health authority legal departments debated and negotiated at great length over how to establish suitable data-sharing agreements. This made development particularly difficult, as we had to build user interfaces for data that we could not access, but could not access the data until the system was built. Extrapolating this problem more generally, it is clear that (particularly in the public sector) various agencies consider (rightly so) that they have duties of care not to share individuals’ personal data with anyone under any circumstances, even if the access would be programmatic and then limited to staff who shared a duty of care to the individual or had a need to know. Clearly there is a question of service provider liability when it comes to certain categories of public data. 
We could also imagine similar challenges in allowing any kind of access to commercially held data, even via API, in areas of sensitive data such as therapy, welfare services, dating sites or legal or professional personal services. These challenges make it especially hard for third parties and developers to help individuals to unify and unite their own data, because they cannot (even programmatically) access personal data on the individuals’ behalf. Better data standards [8.5.4] and proxy representations [Principle 4] may help technically, but we can expect much resistance from organisations with strict policies about access to user data (which, in the current litigious landscape, are quite common). Additionally, it is important to recognise that the uniting and unifying of personal data may present additional risks, for example where linking data enables identification that was not previously possible, or breaks expectations of privacy between individuals in a multi-party data set such as that for a family or household. Revealing the provenance of information may also break expectations of privacy in some contexts. Any systems and processes that unify and unite data or reveal its origin must therefore be designed with the utmost attention to both individual and intrapersonal privacy, in line with privacy by design (Cavoukian, 2010) thinking.
The pursuit of better HDR is fundamentally a design problem. The designs it demands, however, may not always be implementable in practice. For example, some companies simply do not provide APIs to access the data that people want to access, nor are they compelled to do so. As experiences with Cornmarket’s PDS prototype showed, building user experiences that enabled people to obtain data via GDPR requests and download portals and then upload it into the PDS proved technically difficult and hard to design; the resulting user experiences were complex and unsatisfactory to users, and worse still would need to be repeated whenever an up-to-date view of data was required. As Case Study Two suggests [5.6.1], something that would help greatly here is a recognition by policymakers and data holders that one-off, manual access to data is inadequate, and that what people really need (both programmatically as data and in human-readable form as information) is a live view or feed of their data. This could be integrated into Human-centric life interfaces and ecosystem viewers, which would become easier to design, build and use. This would represent a far less cumbersome user experience. Interestingly, the desire for a feed of data changes also surfaced in Case Study One [4.4.3] as something that would be valuable to both parties, validating the need to view data as dynamic and changing rather than static. It is clear that building workable solutions in these areas that meet user needs will require multiple specialisations: in data rights law, in user experience design, and in specific domains such as commercial data holding or social care, and it is hard to see how these changes could be co-ordinated by any one party. It will be especially difficult for activist researchers, especially when efforts to build alternative ways to access platform-held data feeds can carry costly personal penalties, as in Louis Barclay’s case [Principle 12; Barclay (2021)].
The technical and design challenge of building human information interfaces is a ‘grand challenge’ (Lufkin, 2017) level of problem. To build what is in essence an operating system that deals in life information and ecosystem information concepts rather than files and data records is, to put it mildly, non-trivial. That does not mean that it is not necessary or worthwhile to engage in this endeavour though. Information modelling, such as that described in 9.4, could help, as could semantic information standards [8.5.4] and automatic entity identification [9.4.3]. Since a key part of such interfaces relates to both human and machine having a greater understanding of the information within data, past guidelines on legibility [2.3.2] and effective access [2.1.4], as well as all the lessons from the PIM space [2.2.2] about how people need to interact with personal information, are also important. It may also be possible to take lessons from the explainable AI space, where a report has identified four principles: Explanation, Meaningful, Explanation Accuracy and Knowledge Limits (Phillips et al., 2020). However, understanding such explanations should not require the acquisition of technical expertise. As with explainable AI, being able to explain a highly complex technological construct is not useful to an individual if it requires them to gain expertise and interest in the technology itself, because this creates issues around technical skills and whether or not the individual is inclined to take an interest in technology. As Mark Weiser envisioned (Weiser and Brown, 1996), technology should ideally recede into the background of our lives. Thus, explainability and understandability should also very much be anchored in the human side of human data relations, rather than in the technology and the data.
The focus should be on bringing the technology into the human mental landscape, not bringing the human into the world of the computer (and even when learning about ecosystems, which might seem to go against this, explanations can be presented primarily in terms of companies that the individual has relations with, and third party entities, rather than servers and databases etc.).
Perhaps the most challenging practicality of shifting towards a human-centric paradigm for data and information relations relates to the amount of effort that is currently required for people to take charge of their digital life. Even a motivated individual such as the author of this thesis finds it very difficult to achieve meaningful awareness and control over their digital life. A further problem is that, in many cases, the motivation is not there, both in terms of apathy or powerlessness as mentioned above [4.4.4; 5.5.4], but also in that seeking better Human Data Relations is currently hard work that few are inclined to do. It is easier to use the system as a consumer than to participate in one’s digital life as a citizen (Alexander, Conrad and Eno, 2022). There is work to be done in winning hearts and minds [9.6]. However, the objective of the Human Data Relations agenda is not to place an undue burden on the citizen. The identified need for improved data literacy [9.6.1] applies particularly to those with the power to influence policies, technologies, civic or commercial structures, processes or public opinions, so it is important that they be persuaded to act upon this new understanding in their professional capacities, not just through their role as users of digital technology. Another way of thinking about the issues of digital power over data is to compare them to another complex issue: personal finance. It is in every individual’s interest to acquire utility and control over their personal finances, but this does not require every individual to take an interest in tax law or become an expert in accounting ledgers. Society has responded to this need through the production of specialist service providers–accountants and financial advisors–who can help individuals who do not wish to engage with the detail to manage these aspects of their lives. Self-service budgeting and tax software is now available as well for those with lower budgets.
This is the model that a world of better Human Data Relations could also adopt. Already data access and understanding services are emerging [9.3.4], and this sector will inevitably grow as demand and investment increases. As with financial responsibility, or environmental impact, or personal safety, there is a certain amount that individuals can do for themselves, but the greater impact and moral responsibility to really bring about change lies with lawmakers, regulators and the organisations that create the infrastructure of society, be they in the private sector or public sector. The approaches presented in Chapter 9 are only first steps: the beginning of possible trajectories to improving society towards a more human-centric and individually-empowered one. Of all the quadrants for change [Figure 9.1] it is the external ones that are most critical. The structures of society must change (bottom-right quadrant), and only when this occurs can people be offered a meaningful choice. Only when people are able, and inclined, to take that choice, between the data-centric status quo and the human-centric future, can our human data relations–the way we live our lives in a data-centric world–actually change in a meaningful way (top-right quadrant).
This section will examine the limitations and future potential not of the HDR agenda, but of this thesis and this body of research itself.
Methodologically, this research achieved significant success in helping participants engage with the world of data. The use of data cards [Figure 3.7] throughout the Early Help context was particularly successful, because it converted dry data concepts into relatable life information concepts, but also because it gave participants boundary objects and things-to-think-with [Bowyer et al. (2018); Brandt and Messeter (2004); 3.4.2; 4.5.2]. The use of storyboarding cards [Figure 3.11; ARI4.3] to explore two-party interactions in place of more traditional UI-sketching approaches was also novel and effective. Such card-based methods for engaging people with data could be developed and explored further, and indeed there is clearly still potential to do so. This was not done within this PhD in part because the COVID-19 pandemic directed the research trajectory away from in-person work, and additionally the availability of the GDPR enabled more data-focused approaches. In fact, even though it did not use tangible data cards, the structured conversations around aspects of one’s own life data used in Case Study Two and subsequently in the digipower project [See Section IV Introduction] proved effective and would also be worthy of further exploration as a methodology [1.2.2]. Any and all of these methods could be developed further to produce useful toolkits for educating people about data policy and law, for example in helping them understand proposed changes to UK data processing in health care with the confusingly-named GPDPR (Shemtob, 2021). Another methodological limitation can be observed with regard to the use of GDPR rights in the research: the methods employed in Case Study Two purposefully did not attempt to measure how well participants could carry out their GDPR rights without aid. The process was a guided one, and some participants freely admitted that without aid they might not have been able to commence or conclude the GDPR processes.
The design of this study took a value judgement that there was greater benefit in educating individuals about GDPR practicalities, and in assessing companies against a level playing field, than in measuring GDPR education levels in the public. Such a study would be worthwhile and could highlight areas for companies and regulatory bodies to improve their public engagement around the use of data access rights. Despite it being a guided journey, great care was taken not to influence or bias participants’ judgements around GDPR, which is why education and questioning focused on participant evaluation of the GDPR experience against privacy policy claims and GDPR rights, rather than against the ideals of the emerging research agenda.
As described in 3.2.2, this PhD used an unstructured process of feedback between Case Studies, individual experimentation, modelling and synthesis work, and the project placements. This means that (as mentioned in footnote 3) the study designs and overall research approach largely did not report findings back to participants to acquire additional feedback, unlike more formal Participatory Action Research approaches (Lewin, 1951; Chevalier and Buckles, 2008). Given further time and resource, additional work could have been carried out to incorporate participant reflections upon findings, or design critiques of HDR proposals, into this body of research. However, this does not mean that findings of the studies were not validated. In general, evolving findings and learnings were validated with participants in different groups or settings. As Chapter 6 shows, there were strong correlations between the Case Studies; each helps to validate the findings of the other. Within Case Study One, the Sentence Rankings exercise [4.3.5] served to validate findings from earlier research by the author and by the SILVER project with participants. In the case of Case Study Two, the digipower project [see Section IV Introduction] (which used a very similar methodology) helped to validate participant experiences from the study. The HDR agenda evolved through regular exposure to real-world settings in the BBC R&D and Hestia.ai placements, which added a further feedback and validation cycle for the discursive contributions of the thesis. Every element of this research has involved engagement with members of the public to inform the evolving designs, in line with UCD [3.2.1] and design justice (Costanza-Chock, 2020) philosophies. Another opportunity would be to conduct further workshop-type activities involving parties representing individual, collective and establishment points of view.
One way to do this would be to take the findings from the industrial track of the thesis back to the stakeholder groups from the participatory track. More generally, such multi-party investigatory research continues to take place in organisations such as BBC R&D, Sitra, Hestia.ai and The Citizens, as well as many HDI-focused academic research labs. A significant opportunity for further work lies in engaging HDR-adjacent activist and grassroots groups and movements such as those listed in 7.6. It is important to acknowledge that this researcher has consciously engaged in activist movements around HDR reform through his engagements with Hestia.ai, The Citizens, and the Open Rights Group [9.3.2], not just to serve the research agenda of this thesis but also due to personal interest in better digital rights, which could invite accusations of bias. As a counter to this, we can note that such engagements have been balanced against the author's lifetime of experience 'within the system', working for commercial technology providers of various sizes including IBM, Open Text, InterRail and Logica. Furthermore, within the timescale of the PhD there were also multiple engagements 'within the system', such as through SILVER/Connected Health Cities and at the BBC. In the SILVER context, the collision of the establishment's traditional data-centric approaches and the researcher's human-centric agenda sparked significant debate and reflection among the research team. Established service-provider-centric values inadvertently embedded into the project's solutionistic framing resulted in the production of a health interface [see Section IV Introduction] that favoured staff perspectives over the supported families' desires uncovered in the project's own research. This shows that even within this PhD, there has been extensive reflection on, and challenge to, the ideas from different perspectives.
Based on these reflections, every effort has been made to ensure that the approaches in Chapter 9 and the Principles can collectively support use within or outside the establishment.
As laid out in 3.1, the research approach of this thesis is unashamedly aligned to an individualist philosophy of personal betterment. As such, it is important to recognise its gaps around exploration of collective sensemaking of data and the identification of data-related objectives that further group or community interests over individual ones. For example, Maskell's Spokespeople project explored how communities might share geolocated data from their cycling journeys to pressure local authorities to improve cycle lane provisions (Maskell et al., 2018), and Elsden's Metadating project looked at how couples might understand and reflect upon their romantic lives through personal data (Elsden et al., 2016). The issue of complex interplays of different individuals' personal data within families and households was present in participants' experiences in both Case Studies, but was not actively explored within this thesis. Indeed, it could be argued that some of the HDR principles might create tension when viewed from a group perspective; for example, knowledge of the provenance of data [Principle 5] could have implications for lost anonymity within households or communities. Activist groups such as Hestia.ai and noyb.eu have explored and continue to explore how collectives can work together to pursue their interests and rights through data, but this is limited by the individualist framing of the GDPR and its equivalents. As an example, there is no means for a community or group to request its collective data from a provider; every individual must make their own request. While GDPR does provide mechanisms for third parties such as data understanding services or lawyers to make data requests on behalf of multiple individuals, experiences at Hestia.ai showed this is rarely well-supported by data holders.
Indeed, in this regard the focus on GDPR as a data access methodology may be limiting, as it risks brushing over the fact that there is a gap in law around the protection of group data that needs to be confronted. Collective data management is not well accommodated in system designs or policy designs today (Baker and Karasti, 2018). Another aspect of this limitation is that much data today is inherently social and interpersonal, such as social network graphs, family media accounts or shared banking and retail accounts. There is a personal data (and thus human information) ecosystem within individuals' own lives as well as the external ecosystem held by providers. The matters of collective privacy (Taylor, Floridi and Van der Sloot, 2016), collective harms through data such as algorithmic profiling (Tisné, 2020), and the exploration of collective rights over data (Mühlhoff, 2023) are all worthy of further study.
In this thesis, a deliberate decision was taken to refer to the distinct legal concepts of data processor and data controller collectively as data holder [ARI2.1]. This was useful for simplifying explanations and was easier for participants to understand. However, it has the consequence of brushing over some important distinctions. A data processor carries out processing of data on a data controller's behalf, and hence does not bear the same legal responsibilities for data protection (for example, processors are not required to satisfy Subject Access Requests). Just as companies can choose a legal basis for data processing other than consent, they can also use the concept of joint controllership to declare their data processing partners, or sometimes the data subject themselves (Edwards et al., 2019), as a data controller. The digipower project found evidence of at least one retailer declaring hundreds of joint controller relationships with minor and unheard-of tracking and cookie companies, making an unmanageably complex picture for users (Bowyer, Pidoux, et al., 2022). The declaration of joint controller relationships can result in situations where it is difficult to clearly delineate responsibility (Colcelli, 2019), and can create additional burdens for smaller companies (Loon, 2018). These complexities can make data rights very difficult to apply in contexts such as Internet-of-Things [IoT] environments (Mill and Quintais, 2022). The case of doorbell cameras has become especially complicated after court rulings determined that owners of doorbell cameras can be considered data controllers and thus liable to large fines (Milmo, 2021) if they cannot satisfy GDPR requests correctly (something which a regular citizen would find very hard to do). This has raised serious questions over whether home IoT and home surveillance technologies are empowering and viable for their stated purpose (Kelly, 2023).
The net result of all of these issues is to make a situation that is already complex and hard to understand for individuals become even more so. There is significant scope for future work to explore the complexities of data relations between organisations, especially with regard to the differing legal terms and responsibilities. The exchange of personal data between data brokers is extensive, particularly poorly understood and under-regulated, and deserves scrutiny (Keegan and Ng, 2022).
At an ideological level, the design thinking around how to improve data relations in line with identified wants [Chapter 6] is heavily influenced by human-centric idealism, as shared by PDE and MyData practitioners [2.3.4]. While human-centricity is a noble ideal (from an individual or social good perspective), it is important to acknowledge its limitations. Coulton and Lindley argue that a focus on the relations that a human has with their world is reductive, and that better 'More-than-Human-centric' design could be achieved by focussing on a metaphor of constellations and a recognition that focusing only on the human perspective can cause aspects of the design context to be overlooked (Coulton and Lindley, 2019). Elisa Giaccardi argues that a model where we consider the co-performance of humans and automated systems could unlock new design thinking (Giaccardi and Redström, 2020); indeed, it is clear that with the emergence of LLM-powered chatbots such as ChatGPT, such a paradigm shift in HCI design thinking will be vital. Wakkary et al. consider that humans and technologies are fundamentally entangled in complex relations which require the use of significantly more advanced philosophical lenses to be effectively designed for (Wakkary, Oogjes and Behzad, 2022). Pragmatically, however, simple human-centric visions such as the MyData principles (MyData, 2017) and the Human Data Relations agenda of this thesis do seem to be useful mental models and sensible 'first steps' for research and design innovation as we move beyond data-centric and human-computer interaction thinking. The more advanced mental models of human-centricity's critics will surely be useful once some of the first steps toward human-centred personal data ecosystems have been realised.
In considering the possibilities for future work in improving people's relationships with and through data, the Human Data Relations agenda can be considered as a design challenge at multiple levels. It could be used by interface designers to consider how to better represent data as meaningful life or ecosystem information to users [7.4; 9.4.4; 9.4.5]. It could be used by user experience designers to rethink how existing interaction journeys can become more engaging and empowering, and deliver lasting value to users [4.4.2; 5.6.2]. It could be used in service design or business process design to consider how service relationships can be redesigned to be more involving and inclusive, or to achieve greater levels of data accuracy, data use consent or improved customer loyalty [4.5.3; Principle 13]. It could be used in the design of law or policies to give people more effective digital rights [5.6.1]. It could be used by public service providers in the design of better public engagement programmes or social interventions [4.6]. It could be used by activist groups to design campaigns or initiatives that might have greater impact [9.3; 9.5]. It could even be used by individuals to design better systems (in other words, to make different provider and app / workflow choices) for managing their own digital lives [5.6.3; 9.3; 9.6.1]. All of these areas and more are suitable for future design and research work building on the kernel presented here in the Human Data Relations agenda.
As an experienced software engineer, power user and technology blogger who had considered the loss of digital agency for many years [1.1], my journey into this research space was an unusual one; I arrived with already-formed ideas about the nature of the problem. This was not an ideal match for the traditionally participant-led approach of HCI, where ideas and insights normally arise solely from one's participants. However, through the discipline of the Digital Civics programme and the process of publishing peer-reviewed papers, and recognising that HDR issues would be unlikely to surface organically, I was able to use careful sensitisation [3.4.1], balanced and open questioning, and neutrally-designed stimuli [3.4.2] in a way that elevated participant experience to be the primary source of data, producing findings and discursive conclusions that are as much the participants' as my own.
Along the way I discovered vital areas of literature and existing work, most notably the foundational work of Weiser, Abowd, Crabtree and others [2.3.1; 2.3.3], the sub-discipline of Human Data Interaction [2.3.2] and the emergent innovation around Personal Data Ecosystems and MyData [2.3.4]. Collectively through these discoveries, I solidified my existing understandings and was able to contextualise my evolving learning against the established research landscape.
As my understanding from the Case Studies coalesced into a clear, cross-validated understanding of what people want from data and from data holders [Chapter 6], this gave me the confidence to grow and evolve as a researcher, moving from investigatory or theoretical research towards activist exploration of how to work towards delivering these new capabilities in practice, enabled by the models and ideas I developed. This ultimately led me to recognise that, in this body of work, I had identified a newly emergent design philosophy that deserved to be named, scoped, and explored—the research agenda for better Human Data Relations.
I was especially lucky to find peripheral activities, especially with the BBC and Hestia.ai, that fitted so well alongside my research agenda. These activities slotted perfectly into the action research cycle [3.2.2; Figures 3.1 and 3.2] of my thesis, producing a powerful feedback loop where findings from the Case Studies became immediately applicable to real-world design activities, while experiences of the real-life barriers to pursuit of the HDR goals helped to challenge and evolve the theoretical models (such as shared data interaction) and designs emerging from the Case Studies. The project contexts provided a place where emerging qualitative findings and design ideas could be tested and iteratively improved through attempts to operationalise them, producing well-grounded and actionable learnings.
This dual research-and-practice approach has allowed me to push this thesis further than a traditional HCI study would allow, and underpins the two-track research approach of this thesis: in the secondary project work track, I stepped out of the traditional researcher-as-observer stance and took an active role as an expert in user-centred design (UCD) [3.2.1], adversarial design [3.2.1], and practical software interface and process design and innovation.
It has been a tremendous privilege to spend over six years examining in great detail the nature of the problems facing our data-centric society, to translate those impacts into tangible needs and an understanding of the problems caused by the power imbalance over personal data, and to be able to map out the landscape and possibilities for improving the way we relate to data. Through this research, I have discovered rich evidence to quantify and qualify the losses of agency I had suspected prior to beginning the PhD, and in a far greater level of detail than existing research had thus far uncovered. The programme has also given me space to experiment with using both GDPR and web-scraping to access data and push boundaries, to really embrace my role as an HDR activist and adversarial designer [3.2.1; Figure ARI7.1] while also exploring what can be achieved from 'within the system' through the work with Connected Health Cities and BBC R&D. It has allowed me to design and prototype new models and views of data and of information which have transformed the way I look at digital information and how we relate to it, in particular:
I hope these models, as well as the other contributions [1.2], can help others to develop their thinking in the same way, to become more HDR-literate and contribute to the crusade for HDR reform that the world so desperately needs.
The collaborative opportunities have been significant. Without this PhD, I would never have had the opportunities to discuss and develop models for personal data interaction and improved ecosystem negotiability with experts at the BBC, Hestia.ai, Sitra and in the wider MyData community. Alongside these formal collaborations, I have disseminated ideas through blogs, tweets, workshop papers and lectures, which has helped not only to refine and clarify ideas but also to stimulate valuable discussions with interested people, whose feedback developed the models and my own learning further.
This opportunity has opened doors that have allowed me to pivot my career towards putting these learnings into action, working on important projects [see Section IV Introduction] to explore how data interaction reforms can be realised in practice, and how we can become not just innovators but social data activists. Thanks in large part to this research, I am now a Senior UX Designer with the BBC, in a team specialising in data user experience. What started for me as a passion project and a journey in research has become a journey into the design profession. As a designer, I aspire to bring the benefits of this research to the general public through the design of services and interfaces that aim to inform, educate and entertain people in their everyday lives. I now know how to begin to have an impact, how to work on building that better HDR future I and my participants have imagined. It is the journey of a lifetime, and also one that is in many ways just beginning. I hope that my work and this thesis can contribute to a better, more human-centric digital world, and I can't wait to see where this leads.
This thesis offers five key contributions, which are summarised in 1.2:
The constituent parts of these contributions can be easily located using the HDR index, which is located after the Appendices.
Its detailed understanding of individual needs around data interaction and data-centric service relationships [Chapter 6] is backed by participatory action research in both public sector and private sector Case Studies [Chapter 4; Chapter 5], which provides a clear answer to the two primary research questions RQ1 [3.3.1] and RQ2 [3.3.2]: people want visible, understandable and useable data, process transparency, individual oversight capabilities and involvement in decision making.
Based on a solid grounding in existing literature, policy and innovation around Data Access, Personal Information Management, Human Data Interaction and Human-centric Innovation [Chapter 2], these needs have been synthesised into a clearly defined new agenda for future research and innovation, Human Data Relations (HDR) [7.2], which encompasses four clear objectives [7.5] for improving individual agency and societal power imbalances around data:
Sections III-V of the thesis took this body of work much further than a traditional HCI PhD, drawing on the author's experiences with the practical pursuit of better Human Data Relations in four different real-world academic and industrial project settings [see Section IV Introduction]. Through additional insights, designs and implementation strategies [Chapter 9], the thesis offers not just a theoretical frame for this area of research, but clear and actionable Principles [10.1] that could be immediately explored by researchers and innovators; the thesis thereby becomes an anthology of reference material, designs and strategies for HDR reform. This practical contribution of the thesis is delivered in four distinct parts:
Through its Case Studies, this thesis has made additional contributions to the fields of Early Help and GDPR Data Access, as detailed in 1.2, as well as more generally to any research or design context that deals with how individuals or service providers relate to personal data. The work of this thesis has had demonstrable impact, with methodologies and models from this work having been adopted in projects in the European Union and in the UK. Over a dozen publications, workshops and presentations of the work in this thesis have been delivered [1.3], and the research contributed value to real-world industrial projects at BBC R&D in the UK, Hestia.ai in Switzerland and their client Sitra in Finland, through the sharing of the design Principles and the adoption of models of personal data and methods for data interviewing pioneered here.
In answer to the research gap identified in 2.2.5, this thesis has examined the role of personal data in a 'whole life' context. The insights and Principles of this thesis show that personal data should not be an owned commodity or viewed as something static or inert. It is reflective of something real, both in the value it can offer to people's lives and in the role it can play as their proxy in service relationships. Personal data is ultimately an aspect of the human, and as such to separate or extricate it from that human without their involvement or consent is a violation of individual trust, liberty and respect. Personal data is not a commodity. It is part of a fundamental human right to private life, and this privacy is being regularly violated by the treatment of personal data without sufficient consideration of the individual behind the data. The growth of systems which view data as a resource to be mined, a thing to be transacted with, overlooks these moral and socially important human values. It is critical that individuals are given greater agency and control over their personal data, and this thesis has shown that current laws and regulations are, so far, inadequate for this purpose. This research has provided a greater understanding of what is lacking, and what might be done, enabling us to move forward in this space.
Through the grounded and detailed references and examples in Section IV, this work moves beyond conducting research to understand human personal data wants, and sets the scene for a progressive and activist agenda to take action in service of those wants, with the objective of reconfiguring society into one where those human-centric needs are better met. It constitutes a call to arms for future research, innovation and activism in Human Data Relations, combined with a detailed guide to the data economy landscape and what needs to change, and an arsenal of design and implementation strategies for how HDR reformers might fulfil their role as a recursive public [7.6]. Armed with these insights, practitioners of HDR reform can drive us towards a better future of increased agency for individuals, greater data use capabilities, and a more balanced landscape around the use of personal data by service providers across society—in short, a better world for us all.